---
title: Information System DB Design
description: >
  Things to keep in mind when designing a database for an information
  system
categories: posts
tags: [database]
---

<details markdown="1" id="table-of-contents">
<summary>
Table of Contents
</summary>

* TOC
{:toc}
</details>

## Requirements analysis

Although the focus of this article is DB design here are some things to
consider from the requirements:

- Purpose of the information system.
- Features based on needs, never on assumptions.
  + "must be able to..."
- Go from open to close questions.
- Describe user roles, and privileges.


## Data Modeling

Identify

- required data
  + draft/review mandatory input forms, and reports
- problem domain concepts

Model

- organize concepts as sets of attributes
- an instance of a concept is
  + a tuple of attribute values
  + visualized as a row.
- fields collected according to a tuple of attributes
  + make a relation
  + are generally visualized as tables
  + are shown as columns, for single attributes


### Table design

A table may stand for

- a concept with definite attributes. eg. name -> 1st, middle, last
- an attribute with indefinite descriptors. eg. phone -> work, mobile, etc

Also, consider

- extracting redundant attributes into a new table
- using associations for composite concepts

#### Test designs

From the first schema implementation we should test our table, and
associations designs using sample/fake data.

- check all operations work as expected
- define associations
- improve ease of operation

We should iterate over table, and association designs to avoid modifying
their attributes once in production.

Relevant Sequel pages:

- [Migrations](https://sequel.jeremyevans.net/rdoc/files/doc/migration_rdoc.html){:rel="nofollow noreferrer noopener"}
- [Schema Modification](https://sequel.jeremyevans.net/rdoc/files/doc/schema_modification_rdoc.html){:rel="nofollow noreferrer noopener"}
- [Testing with Sequel](https://sequel.jeremyevans.net/rdoc/files/doc/testing_rdoc.html){:rel="nofollow noreferrer noopener"}

Also, since test data hardly ever produces the best optimizations for
real data we should avoid creating indexes, and the like, during testing.


### Associations

Like DB tables, associations also stand for concepts in the problem
domain. They are used to simplify the code that deals with related tuples
in separate DB tables.

We can only associate DB tables which share key attributes.


#### Keys

We enforce uniqueness by assigning an attribute as the key. Whenever we
need to use more than one attribute to differentiate each tuple we call
it a composite key.


##### Primary key

A table's unique identifier is called primary key. To assign a primary
key to a table we need to identify (in order)

- super keys. all combination of attributes that uniquely id a record.
- candidate keys. super keys which are
  + irreducible
  + always unique
  + never NULL.
  + immutable.
- primary key. The candidate key with least attributes.
  + should not expose sensitive data.
  + prefer single attribute keys over composite ones.
  + artificial/surrogate key. unique auto-generated random id.
  + natural/business key. From among existing attributes.


##### Foreign key

On every association, one of the tables has to hold the primary key of
the other. Although is meant to connect tuples in different tables
it's called foreign key.

To avoid orphaned tuples across tables, whenever we delete a tuple from
any of the tables in an association we must either:

- Disallow changes.
- Update the tables in the relationship.
- Hide, and flag connected tuples as deleted
- Delete connected tuples


#### Many-to-One

Two tables are in a many-to-one (`*-1`) association when zero, one, or
more tuples in one table are associated to a single tuple in a different
table.

```text
| albums     |               | artists |
|------------|               |---------|
| :id        |       /---1-  | :id     |
| :artist_id | -*---/        | :name   |
| :name      |
```

Using Ruby's Sequel toolkit

```ruby
class Album < Sequel::Model
  many_to_one :artist
end
```

#### One-to-Many

Two tables are in a one-to-many (`1-*`) association when every tuple in
one table is associated with zero, one, or more tuples in another table.

```text
| artists |              | albums     |
|---------|              |------------|
| :id     | -1---\       | :id        |
| :name   |       \--*-  | :artist_id |
                         | :name      |
```

This association is complimentary to many-to-one. Using Sequel,

```ruby
class Artist < Sequel::Model
  one_to_many :albums
end
```


##### One-to-One

A one-to-one (`1—1`) association may be necessary when we have to split a
table into two or more tables for security or performance reasons.

```text
|  users  |              |  phrases |
|---------|              |----------|
| :id     | -1---\       | :phrase  |
| :email  |       \--1-  | :user_id |
```

This is considered a subset of the one-to-many association.

```ruby
class Artist < Sequel::Model
  one_to_one :album
end
```


#### Many to Many

A many-to-many association (`*-*`) allows each tuple in a table to be
associated to many tuples in an associated table, and _vice versa_.

```text
| albums     |               | artists |
|------------|               |---------|
| :id        |       /---*-  | :id     |
| :artist_id | -*---/        | :name   |
| :name      |
```

To satisfy both conditions, this association is broken into two
one-to-many associations using a joined table to associate them.

```text
| albums |          | albums_artists |          | artists |
|--------|          |----------------|          |---------|
| :id    | -1-\     | :album_id      |     /-1- | :id     |
| :name  |     \-*- | :artist_id     | -*-/     | :name   |
```

Using Sequel, and assuming the `albums_artist` table exists

```ruby
class Artist < Sequel::Model
  many_to_many :albums
end
```

##### One-through-one

The one-through-one association sets up a one-to-one association through
a joined table.

The difference between the one-through-one, and many-to-many association
is that in the former there can only be 0 or 1 records in the associated
table.

This association is useful for

- enforcing unique references, in the join table, to one of the tables.
- ordering the association, and only return the first record.

Using Sequel

```ruby
class Artist < Sequel::Model
  one_through_one :album
end
```


### Normalization forms

Normalization is a way to organize data. There are various normal forms,
each dependent on the previous one.

#### 1st Form

- every attribute field can only hold a single a value
- eliminate multiple values per field
- assign a primary key
  + combination of fields/attributes that uniquely id a tuple/record
- extract the attributes whose field values cause inner tables

It's usually easier to apply the 1st form in this order

1. eliminate multi-values
2. eliminate repeating groups
3. assign primary key

#### 2nd normal form

All attributes must be related to the primary key. Extract any attribute
which doesn't refer to the primary key, into it's own table.


#### 3rd normal form

Remove transitive dependencies:

- Ensure none of the non-PK attributes depends on a non-PK attribute.
- Remove attribute values based on other attributes. Instead, generate
  them dynamically.


#### More normalization

There are other four normal forms. Beware, the more we normalize, the
more we decompose our data into smaller tables. Which could make it
difficult to write queries due to multiple `Join`.

- Boyce-Codd
  + extract all non-key attributes that don't depend on each candidate key
- Symmetric Constraint (4NF)
  + ensure all attributes of equal importance are part of the CK
- Projection-Join (5NF)
  + Break the CK into associations
- Irreducible Relations (6NF)
  + Break relations into the smallest set of attributes
  + Mostly used for interval attributes

_Although we make reference to a composite key in forms 4 and 5, at that
point those should be the only attributes in the relation._

## Data Integrity

Some of the techniques we can use for data integrity are:

- Define each attribute's set of permitted values
- Constraint attributes types
- Trigger a chain of mandatory procedures in response to an event.

Sequel relevant documentation:

- [Model Validations](https://sequel.jeremyevans.net/rdoc/files/doc/validations_rdoc.html){:rel="nofollow noreferrer noopener"}
- [Constraint Validation Extension](https://sequel.jeremyevans.net/rdoc-plugins/files/lib/sequel/extensions/constraint_validations_rb.html){:rel="nofollow noreferrer noopener"}
- [Constraint Validations Plugin](https://sequel.jeremyevans.net/rdoc-plugins/classes/Sequel/Plugins/ConstraintValidations.html){:rel="nofollow noreferrer noopener"}

### Transactions

A transaction is the execution of several operations considered a unit of
work. A transaction is consider reliable when it has these properties

- Atomicity. All operations must succeed or nothing changes.
- Consistency. Data moves from one valid state to another.
- Isolation. DB state after executing a transaction concurrently or
  sequentially is the same.
- Durability. Changes should be persistent.

Sequel relevant links:

- [Database Transactions](https://sequel.jeremyevans.net/rdoc/files/doc/transactions_rdoc.html){:rel="nofollow noreferrer noopener"}

## Advanced features

Once the system has been in production for a while we should analyse DB
statistics to improve the DB performance. Consider using features such as

- index
- functions
- views
- prepared statements
- windows

Keep in mind that some DBMS, such as postgres, provide details of their
implementations so we can use the best one according to our needs.

Sequel relevant links

- [Prepared Statements](https://sequel.jeremyevans.net/rdoc/files/doc/prepared_statements_rdoc.html){:rel="nofollow noreferrer noopener"}
